# Efficient Visual Question Answering

Qwen2.5 VL 3B Instruct GPTQ Int3
Apache-2.0
A GPTQ-Int3 quantized version of Qwen2.5-VL-3B-Instruct for multimodal image-text tasks, offering lower VRAM usage and faster inference.
Image-to-Text, Transformers, Supports Multiple Languages
hfl
60
1
Nanollava 1.5
Apache-2.0
nanoLLaVA-1.5 is a compact vision-language model with under 1 billion parameters, designed to run on edge devices.
Image-to-Text, Transformers, English
qnguyen3
442
109
Imp V1.5 4B Phi3
Apache-2.0
Imp-v1.5-4B-Phi3 is a lightweight multimodal large model with only 4 billion parameters, built on the Phi-3 language model and a SigLIP visual encoder.
Image-to-Text, Transformers
MILVLG
140
7
© 2025 AIbase